This page last changed on Jan 03, 2009 by iank@bearcave.com.
The Material Below is Incorrect

A lab notebook should record the steps taken to arrive at a result. Unfortunately, some of these steps will go in the wrong direction. This page is just such a case.

Return is usually calculated using the formula return = price_t - price_{t-1}. There are variations on this, like log return. The signal below, signal = short200 - long1200, is a proxy for return. The short200 average will be close to the actual time series and the long1200 average will lag the time series. The difference is a sign reversed return, since a later average is being subtracted from a more recent average.

As the graphs below show, this pseudo return falls into a curve, at least when looked at over multiple days. This curve is not the same as the distribution of the minimum and maximum of the signal. The text below treats the two as the same, but this is not correct. What is needed to reproduce the visual algorithm are these minimum and maximum values. Then a line is plotted through the minimum values. The question is how to plot this line.

The problem is that the minimum values may be slightly offset from each other. What we want is something like a minimum line that intersects one to two values a day (on average), where the line has a slope of zero (i.e., it's a flat line) and minimizes the distance to the minimums.
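One possible reading of this flat-line idea is sketched below. It is not an algorithm from this notebook. It assumes that "the minimums" are simply the lowest `perDay` signal values of each day, and it places the line at the median of those pooled values, since the median is the level that minimizes the sum of absolute distances to a set of points. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: a zero-slope line placed at the median of the per-day signal minima. */
class FlatMinimumLineSketch {

    /**
     * signalByDay holds one array of signal values per trading day.
     * perDay is how many of each day's lowest values to keep (assumed 1 or 2).
     */
    static double level(double[][] signalByDay, int perDay) {
        List<Double> lows = new ArrayList<>();
        for (double[] day : signalByDay) {
            double[] sorted = day.clone();
            Arrays.sort(sorted);                       // ascending: lowest values first
            for (int i = 0; i < Math.min(perDay, sorted.length); i++) {
                lows.add(sorted[i]);
            }
        }
        if (lows.isEmpty()) {
            throw new IllegalArgumentException("no signal values");
        }
        double[] pooled = lows.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int mid = pooled.length / 2;
        // Median of the pooled minima: minimizes the sum of absolute distances.
        return pooled.length % 2 == 1
                ? pooled[mid]
                : (pooled[mid - 1] + pooled[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double[][] days = {
            {-1.5, -0.2, 0.3},
            {-1.2, 0.1},
            {-1.6, -0.4}
        };
        System.out.println(level(days, 1));
    }
}
```

With tick data, the two lowest values of a day are likely to be neighboring ticks in the same trough, so a real version would probably need to pick local minima rather than raw lowest values. Using the mean instead of the median would minimize squared distance instead.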

The Simple Moving Average model creates a signal from the difference of a short window (of 200 trade ticks) and a long window (of 1200 trade ticks). That is, signal = short200 - long1200. When the signal reaches a minimum value, a block of stock is bought. When Ed developed the Excel models he found the buy signal by drawing a line through a set of minima for several days of signal. In theory this manually finds points on the far end of the signal distribution. If a buy is made at these points, there will be reversion to the mean and the model will make money, on average.
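As a rough illustration of how such a signal could be computed from a tick series, here is a minimal sketch. It assumes plain (unweighted) moving averages over the last N ticks and emits a value only once the long window is full. The class name, method name and the tiny window sizes in `main` are hypothetical, chosen only to keep the example readable.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: signal = SMA(short window) - SMA(long window) over trade ticks. */
class SmaSignalSketch {

    /**
     * Returns one signal value per tick, starting at the first tick for
     * which the long window is full. Window sizes are counted in ticks.
     */
    static List<Double> signal(double[] prices, int shortWindow, int longWindow) {
        if (shortWindow <= 0 || longWindow < shortWindow) {
            throw new IllegalArgumentException("need 0 < shortWindow <= longWindow");
        }
        List<Double> out = new ArrayList<>();
        double shortSum = 0.0;
        double longSum = 0.0;
        for (int i = 0; i < prices.length; i++) {
            // Add the newest tick to both running sums.
            shortSum += prices[i];
            longSum += prices[i];
            // Drop the tick that just fell out of each window.
            if (i >= shortWindow) shortSum -= prices[i - shortWindow];
            if (i >= longWindow) longSum -= prices[i - longWindow];
            if (i >= longWindow - 1) {
                out.add(shortSum / shortWindow - longSum / longWindow);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] prices = {10, 11, 12, 13, 14, 15};
        // The model described above would use signal(prices, 200, 1200).
        System.out.println(signal(prices, 2, 4));
    }
}
```

Running sums keep the cost at one addition and one subtraction per tick per window, which matters when the long window is 1200 ticks.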

Finding the signal minima manually is not practical for more than a few stocks, so a method is needed to find the minima via a software (Java) algorithm. One way this can be done is to calculate a histogram for the signal values. The elements of the histogram are created from the signal values (rounded to three decimal places) and the count of the number of times the signal had that value. The algorithm starts at the left end (negative end) of the histogram and moves to the right until it finds a histogram "bucket" that has at least one value for each day of data that went into the signal data. So if there are five days of signal data, the algorithm will try to find a histogram bucket with five values.
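The histogram search described above might look roughly like the following. This is a sketch under stated assumptions rather than the code that produced the parameters on this page: the names are hypothetical, and buckets are keyed by the signal value multiplied by 1000 and rounded, which is one way to get three decimal places without floating-point keys.

```java
import java.util.Map;
import java.util.OptionalDouble;
import java.util.TreeMap;

/** Sketch: leftmost histogram bucket whose count reaches the number of days. */
class HistogramTriggerSketch {

    /** Bucket key: signal value rounded to three decimal places, stored as thousandths. */
    static long bucketOf(double signalValue) {
        return Math.round(signalValue * 1000.0);
    }

    /** signalByDay holds one array of signal values per trading day. */
    static OptionalDouble trigger(double[][] signalByDay) {
        // One combined histogram for all days; TreeMap keeps buckets sorted.
        TreeMap<Long, Integer> histogram = new TreeMap<>();
        for (double[] day : signalByDay) {
            for (double v : day) {
                histogram.merge(bucketOf(v), 1, Integer::sum);
            }
        }
        int days = signalByDay.length;
        // Walk from the most negative bucket toward the right.
        for (Map.Entry<Long, Integer> e : histogram.entrySet()) {
            if (e.getValue() >= days) {
                return OptionalDouble.of(e.getKey() / 1000.0);
            }
        }
        return OptionalDouble.empty();   // no bucket is full enough
    }

    public static void main(String[] args) {
        double[][] twoDays = {
            {-1.5001, -1.2001, -1.2004, 0.1},
            {-1.3, 0.2}
        };
        System.out.println(trigger(twoDays));
    }
}
```

In the `main` example both values in the chosen bucket come from the first day, so a count of at least N does not guarantee that every day contributed to the bucket.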

A histogram for the stock GS (Goldman Sachs), from four days of market data (June 30, July 1, July 2 and July 3, 2008) is shown below. The y-axis is the frequency (i.e., the number of values per bucket).

The GS histogram is relatively smooth. However, the histogram for CME is not as evenly distributed.

The algorithm ended up picking up the bucket near -11, which may represent an outlier. The problem with the CME distribution is that it has lots of spikes. If these were smoothed out, a value nearer the center of the distribution would be chosen and it's not clear if this is correct.

The parameters that were calculated from the histogram distributions (using June 30, July 1, July 2 and July 3, 2008) are:

stock parameter
GOOG -5.101
GS -1.465
CME -11.704
FCX -1.511
OIH -3.384
RIG -1.689
AMGN -0.26
BIIB -0.636
CMI -2.299
ERTS -0.729
ICE -3.345
CAT -1.227
BBBY -0.419
BRCM -0.88
GENZ -0.665
AMZN -1.787

Unfortunately, these parameters did not result in profitable (paper) trading. Trading on July 7, 2008 with 1000 share orders yielded the results shown below:

The debug trace from the Trade Engine is included below:

TWS Time at connection:20080707 05:28:14 GMT-08:00
Connection to TWS succeeded
nextValidId: Next Valid Order ID: 480

error: -1 | 2104 | Market data farm connection is OK:usfarm
480: BUYING with a market order GS at 2008-07-07 09:49:33 305
480: Fill price for GS: 178.02 at 2008-07-07 09:49:40 581
480: For GS MAD = 0.125, MAD * 3.0 = 0.375 at 2008-07-07 09:49:40 581
480: Issuing stop order for GS at price 177.65 at 2008-07-07 09:49:40 581
error: 481 | 202 | Order Canceled - reason:
482: Hit convex signal turn for GS Fill price = 177.98 at 2008-07-07 09:49:43 603

483: BUYING with a market order AMGN at 2008-07-07 10:06:43 821
483: Fill price for AMGN: 50.06 at 2008-07-07 10:06:59 131
483: For AMGN MAD = 0.05999999999999517, MAD * 3.0 = 0.1799999999999855 at 2008-07-07 10:06:59 131
483: Issuing stop order for AMGN at price 49.88 at 2008-07-07 10:06:59 131
error: 484 | 202 | Order Canceled - reason:
485: Hit convex signal turn for AMGN Fill price = 50.04 at 2008-07-07 10:07:15 430

486: BUYING with a market order FCX at 2008-07-07 12:24:08 304
486: Fill price for FCX: 109.46 at 2008-07-07 12:24:13 074
486: For FCX MAD = 0.2750000000000057, MAD * 3.0 = 0.825000000000017 at 2008-07-07 12:24:13 074
486: Issuing stop order for FCX at price 108.63 at 2008-07-07 12:24:13 074
error: 487 | 202 | Order Canceled - reason:
488: Hit convex signal turn for FCX Fill price = 108.84 at 2008-07-07 12:29:31 348

489: BUYING with a market order AMGN at 2008-07-07 13:06:25 693
489: Fill price for AMGN: 50.02 at 2008-07-07 13:06:28 710
489: For AMGN MAD = 0.00999999999999801, MAD * 3.0 = 0.02999999999999403 at 2008-07-07 13:06:28 710
489: Issuing stop order for AMGN at price 49.99 at 2008-07-07 13:06:28 710
490: Stop order hit for AMGN at 49.98 at 2008-07-07 13:09:04 562
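
The stop prices in the trace above appear to sit at the fill price minus MAD * 3.0 (for example, 50.06 - 0.18 = 49.88 for AMGN). The sketch below reproduces that arithmetic. It is inferred from the log lines only; the method name is hypothetical and rounding to the nearest cent is an assumption, not something the Trade Engine is known to do.

```java
/** Sketch: stop price inferred from the log as fill price minus a multiple of MAD. */
class StopPriceSketch {

    /**
     * fillPrice and mad are taken from the log lines; multiplier is 3.0 there.
     * Rounding to the nearest cent is an assumption made for this example.
     */
    static double stopPrice(double fillPrice, double mad, double multiplier) {
        double raw = fillPrice - mad * multiplier;
        return Math.round(raw * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // AMGN at 10:06:59 in the trace: fill 50.06, MAD about 0.06.
        System.out.println(stopPrice(50.06, 0.06, 3.0));
    }
}
```

When MAD is very small, as in the second AMGN trade (MAD about 0.01), the stop lands only a few cents below the fill, which may be why that stop was hit so quickly.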

Parameters for July 8 trading using tick data from June 30, July 1, July 2, July 3, and July 7, 2008:

stock parameter
GOOG -5.088
GS -1.48
CME -9.729
FCX -1.514
OIH -3.384
RIG -1.689
AMGN -0.467
BIIB -0.636
CMI -2.299
ERTS -0.709
ICE -3.345
CAT -1.227
BBBY -0.419
BRCM -0.522
GENZ -0.785
AMZN -1.787

Unfortunately, these (slightly different) parameters did no better trading on July 8 than the parameters for July 7 did. The trading trace is shown below. Note that only GS and FCX ever even hit the signal.

nextValidId: Next Valid Order ID: 491
error: -1 | 2104 | Market data farm connection is OK:usfarm
491: BUYING with a market order GS at 2008-07-08 09:48:15 077
491: Fill price for GS: 170.97 at 2008-07-08 09:48:17 578
491: For GS MAD = 0.07999999999998408, MAD * 3.0 = 0.23999999999995225 at 2008-07-08 09:48:17 578
491: Issuing stop order for GS at price 170.73 at 2008-07-08 09:48:17 578
492: Stop order hit for GS at 170.66 at 2008-07-08 09:48:46 359

493: BUYING with a market order FCX at 2008-07-08 10:00:32 251
493: Fill price for FCX: 103.15 at 2008-07-08 10:00:35 507
493: For FCX MAD = 0.269999999999996, MAD * 3.0 = 0.8099999999999881 at 2008-07-08 10:00:35 507
493: Issuing stop order for FCX at price 102.34 at 2008-07-08 10:00:35 508
error: 494 | 202 | Order Canceled - reason:
495: Hit convex signal turn for FCX Fill price = 103.12 at 2008-07-08 10:00:43 017

The Algorithm Doesn't do what the Graphical Method Does

The algorithm calculates the signal values for each day separately. It then puts all of these signal values into a single distribution (that consists of the signal values for all of the days). Starting at the far left (the negative side) the algorithm finds the histogram bucket that has a count of at least N values (where N is the number of days in the distribution). This is the value used for the trigger value.

If, for example, one day was unusually volatile, there could be N instances (where N is the number of days in the distribution) in that day that are at the low end of the total distribution. As the daily time window moves forward, this bucket may continue to be picked up. These values were all from one day, so they don't represent the other days in the distribution.

The graphical technique that has been used to hand-parameterize the models will tend to pick a trigger value that is distributed throughout the days in the distribution, since the line that is picked passes through minima on all of the days. This is not the same as the algorithm described above.

The distributions for four days, June 30, July 1, July 3 and July 7, are shown below. If these histograms are combined (as happens in the algorithm used to calculate parameters above) and a histogram bucket with four (or five) elements is chosen, then the parameter will be around -1.5.

If a histogram value is chosen that exists in all of the distributions, the value will be around -1.2, since July 1 does not have as large a negative range as the other days. This algorithm is closer to the "by hand" graphical algorithm. The problem with visual "by hand" algorithms is that they may be adjusted in ways that follow visual rules, not algorithmic rules, which makes the algorithm impossible to reproduce.
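
The "value that exists in all of the distributions" variant could be sketched as follows. This is an illustration, not a tested parameterization method: the names are hypothetical, and the bucket width is left as a parameter because with very fine buckets the days may share no bucket at all.

```java
import java.util.OptionalDouble;
import java.util.TreeSet;

/** Sketch: most negative histogram bucket that is occupied on every day. */
class CommonBucketSketch {

    /**
     * signalByDay holds one array of signal values per trading day.
     * bucketWidth is an assumed tuning parameter (0.001 would match
     * rounding to three decimal places).
     */
    static OptionalDouble trigger(double[][] signalByDay, double bucketWidth) {
        TreeSet<Long> common = null;
        for (double[] day : signalByDay) {
            // Buckets occupied on this day, keyed by rounded multiples of the width.
            TreeSet<Long> buckets = new TreeSet<>();
            for (double v : day) {
                buckets.add(Math.round(v / bucketWidth));
            }
            if (common == null) {
                common = buckets;
            } else {
                common.retainAll(buckets);   // keep only buckets seen on every day so far
            }
        }
        if (common == null || common.isEmpty()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(common.first() * bucketWidth);
    }

    public static void main(String[] args) {
        double[][] days = {
            {-1.52, -1.21, 0.3},
            {-1.18, -0.5},
            {-1.61, -1.23, 0.1}
        };
        System.out.println(trigger(days, 0.1));
    }
}
```

In the made-up data in `main`, the second day never goes below about -1.2, so the buckets near -1.5 and -1.6 are dropped and the common bucket near -1.2 is returned.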


Moving average (Short200 - Long1200) of FCX, June 30, 2008

Moving average (Short200 - Long1200) of FCX, July 1, 2008

Moving average (Short200 - Long1200) of FCX, July 2, 2008

Moving average (Short200 - Long1200) of FCX, July 7, 2008

My Problems got Problems

Reproducing the visual algorithm is more difficult than is implied above. This means that even when the ranges are aligned, the algorithm cannot immediately find a matching value. Nor is it clear without back testing that this will even be an effective technique. While I hate to give up on something, it's not clear how to write an algorithm to effectively parameterize this model. So I am going to concentrate on the EMA models and hope for something better.